Tangent-3 at the NTCIR-12 MathIR Task
Authors
Abstract
We present the math-aware search engine Tangent-3 and report its results for the NTCIR-12 MathIR Task. Tangent uses federated search over two indices: 1) a TF-IDF textual search engine (Solr) and 2) a query-by-expression engine. Math expressions are stored in an inverted index as pairs of symbols extracted from a Symbol Layout Tree (SLT) representation built from Presentation MathML. Retrieval follows a two-stage cascade model. In the first stage, iterator trees over posting lists quickly find expressions that match the query, and these candidates are ranked by the Dice coefficient of matched symbol pairs. In the second stage, the top-k candidates are reranked with a stricter similarity metric that supports unification and wildcard matching. Our system obtains Precision@5 values of 21% for relevant hits (50% when partially relevant hits are included) on the main arXiv task, 25% (49%) on the main Wikipedia subtask, and 45% (84%) on the Wikipedia Formula Browsing subtask.
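The first retrieval stage (symbol-pair inverted index plus Dice-coefficient ranking) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the SLT is modeled as a hypothetical nested tuple `(symbol, {relation: child})`, only immediate parent-child pairs are extracted (the system's actual pair definition may differ), and plain dictionary lookups stand in for the iterator trees over posting lists. All function names are hypothetical. The second-stage reranking with unification and wildcards is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical SLT encoding: (symbol, {relation: child_node}).
# Example: x^2 + 1 -> ("x", {"above": ("2", {}), "next": ("+", {"next": ("1", {})})})


def symbol_pairs(node):
    """Return (parent, child, relation) tuples for every edge in the tree.
    Assumption: only immediate parent-child pairs are used here."""
    symbol, children = node
    pairs = []
    for relation, child in children.items():
        pairs.append((symbol, child[0], relation))
        pairs.extend(symbol_pairs(child))
    return pairs


def build_index(expressions):
    """Map each symbol pair to the ids of the expressions containing it,
    and keep a pair multiset per expression for scoring."""
    index = defaultdict(set)
    pair_counts = {}
    for expr_id, tree in expressions.items():
        pairs = Counter(symbol_pairs(tree))
        pair_counts[expr_id] = pairs
        for pair in pairs:
            index[pair].add(expr_id)
    return index, pair_counts


def dice(query_pairs, candidate_pairs):
    """Dice coefficient over pair multisets: 2|Q ∩ C| / (|Q| + |C|)."""
    matched = sum((query_pairs & candidate_pairs).values())
    total = sum(query_pairs.values()) + sum(candidate_pairs.values())
    return 2 * matched / total if total else 0.0


def search(query_tree, index, pair_counts, k=10):
    """Stage 1: gather candidates sharing at least one pair, rank by Dice."""
    query_pairs = Counter(symbol_pairs(query_tree))
    candidates = set()
    for pair in query_pairs:
        candidates |= index.get(pair, set())
    scored = [(dice(query_pairs, pair_counts[c]), c) for c in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return scored[:k]  # the top-k would be passed to a reranking stage


# Toy collection (hypothetical data for illustration only).
expressions = {
    "e1": ("x", {"above": ("2", {}), "next": ("+", {"next": ("1", {})})}),  # x^2 + 1
    "e2": ("x", {"above": ("2", {})}),                                      # x^2
    "e3": ("y", {"next": ("+", {"next": ("1", {})})}),                      # y + 1
}
index, pair_counts = build_index(expressions)
results = search(("x", {"above": ("2", {})}), index, pair_counts)
print(results)  # [(1.0, 'e2'), (0.5, 'e1')]
```

For the query x^2, the single pair (x, 2, above) matches both x^2 and x^2 + 1, but x^2 + 1 carries two extra pairs, so the Dice coefficient ranks the exact match first. This illustrates why a cheap pair-overlap score is a reasonable fast first stage before a stricter reranker.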
Similar resources
NTCIR-12 MathIR Task Overview
We present an overview of the NTCIR-12 MathIR Task, dedicated to information access for mathematical content. The MathIR task makes use of two corpora. The first corpus contains excerpts from technical articles in the arXiv, while the second corpus contains English Wikipedia articles. For each corpus, there were two subtasks. Three subtasks contain queries with keywords and formulae (arXiv-main...
Exploring the One-brain Barrier: A Manual Contribution to the NTCIR-12 MathIR Task
This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...
MCAT Math Retrieval System for NTCIR-12 MathIR Task
This paper describes the participation of our MCAT search system in the NTCIR-12 MathIR Task. We introduce three granularity levels of textual information, new approach for generating dependency graph of math expressions, score normalization, cold-start weights, and unification. We find that these modules, except the cold-start weights, have a very good impact on the search performance of our s...
Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies
This paper summarizes the experience of Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Based on NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform. Using this platform we rigorously evaluated combinations of new features and picked the most promising on...
Exploring the One-brain Barrier: a Manual Contribution to the NTCIR-12 Math Task
This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...
Publication date: 2016